ABSTRACT

This paper identifies a feature of human brain neural 
nets that may be described as the principle of ease of processing 
(PEP), and that, it is argued, is the primary force guiding a learner 
towards a target grammar. It is suggested that the same principle 
lies at the heart of Optimality Theory, which characterizes the 
course of language acquisition as a progressive reranking of a 
hierarchy of universal and violable constraints. It is observed that 
the hierarchy a learner is in possession of at any particular time is 
the learner’s present characterization of the grammar of the target 
language and will determine what outputs nets involved in linguistic 
processing produce for any given inputs to those nets. It is 
suggested that spatial metaphors may give a clearer insight into the 
workings of neural nets, and that the process of self-organization of 
nets is seen in accordance with the PEP as a realignment of the 
positions of linguistic elements in a multidimensional space that is 
a characterization of the target language. (Contains A6 references.) 
CONSTRUCTIVISM, OPTIMALITY THEORY AND LANGUAGE ACQUISITION -
THE SHAPES WE MAKE IN EACH OTHER'S HEADS. 



This paper identifies a feature of neural nets which may be described as the principle of ease of
processing (PEP) and which, it is argued, is the primary force guiding a learner towards a target
grammar. It is suggested that the same principle lies at the heart of Optimality Theory, which
characterises the course of language acquisition as a progressive reranking of a hierarchy of
universal and violable constraints. It is observed that the hierarchy a learner is in possession of at any
particular time is the learner's present characterisation of the grammar of the target language and
will determine what outputs nets involved in linguistic processing produce for any given inputs to
those nets. It is suggested that spatial metaphors may give us a clearer insight into the workings of
neural nets, and that we can see the process of self-organisation of nets in accordance with the PEP
as a realignment of the positions of linguistic elements in a multidimensional space that is a
characterisation of the target language.

1. Introduction

What form does language take inside people's heads? How are concepts, ideas and meanings instantiated?
How does the learner unconsciously discover linguistic rules? What is it that leads a speaker to be able to 
judge the relative grammaticality of an utterance? 

All these questions can be answered (though not yet fully) if we imagine the brain as a device that
self-organises in accordance with what I shall call the Principle of Ease of Processing (PEP), in the sense of
minimising energy expenditure in converting an input to an appropriate output. By input/output, here, I do
not mean the speech we hear and the speech we produce; instead, I am referring to input to a neural net (from other
nets) being converted to output which then serves as input to other nets. It will be suggested in this paper that
the PEP is a characteristic feature of neural nets in general (biological as well as artificial), and as such is the
primary force that drives and guides the course of language acquisition.

2. Language in the environment and in the brain

We can make a clear distinction between two types of language: language that exists outside, and language
that exists inside, people's heads (roughly, E-language and I-language, respectively; see, for example,
Chomsky 1996:19-24; 1995:15-17). The former exists as speech, signs and writing, the latter as patterns of
neural activity. The former always results from the latter, and inasmuch as the language we read or hear (or
see, if in the form of sign language) causes neural activity within us, we may view speech and its correlates
as intermediaries between the neural activity of speaker/signer/writer and an audience. Indeed, we can say
that language as it exists outside our heads is a representation of the language inside our heads from which it
results.

Language inside our heads is representational too: representational of the external language that causes it,
representational of ideas and concepts in the speaker/signer/writer that we take the external language to
represent, and representational of ideas and concepts within ourselves that are in themselves representations
of yet other thoughts, ideas, feelings and other mental events, together with representations of the external
world of which speakers and the external language they produce are a part.

Language is a highly complex representational system that is an integral part of a larger, even more complex 
representational system that enables us to make sense of our experiential world (see, for example, Bickerton, 
1*^). Given that it is so complex, how is a child capable of learning the language of her linguistic 
community with such evident success? This question has long been investigated and the conventional answer 
is that the child comes equipped with a Language Acquisition Device (LAD), which is taken to be primarily
composed of knowledge structures (principles) that are applicable to any and all human languages (by means
of setting parameters). These hypothesised knowledge structures are known as Universal Grammar (UG)
(see, for example, Chomsky, 1981, 1986, 1988). As Haegeman puts it:

Given that neither formal teaching nor overt evidence seems to be the source of the native speaker’s 
intuitions, it is proposed that a large part of the native speaker’s knowledge of his language, i.e. the
internal grammar, is innate. 

(1994: 12) 

The appeal of the UG hypothesis is strong, particularly if we accept that the learner’s ultimate knowledge is 
underdetermined by positive evidence in the primary linguistic data (PLD) (Wexler, 1991), and that the 
linguistic data we are exposed to contains a fair degree of noise, in the sense of ungrammatical or otherwise
erroneous input (slips of the tongue, false starts, etc.) that the learner should not use as a basis for 
determining the target grammar. 

3. UG as a genetic endowment

The view of UG that seems to be commonly held amongst many of its advocates is that it is innate 
knowledge that is built in from the start (the ‘strong continuity’ view - see Hyams, 1983; Pinker, 1984), and 
that the process of acquisition is largely one of selection and elimination in response to triggers in the PLD. 
After all, Chomsky identifies UG as ‘the initial state’, or theory of the initial state (see, for example,
Chomsky 1980; 1995:14). This mainstream view of learning is described as regressive by Quartz and
Sejnowski (1995), in that it portrays development as essentially a two-stage phenomenon, with the first stage
(preparing the ‘initial state’) being one of constructing a rich neural pool of prerepresentations as a result of
genetic and epigenetic processes, followed by a stage in which the prerepresentations are selectively 
eliminated (in response to the processing of the PLD) until a subset of the original pool is arrived at which is 
thought to underlie, though not fully characterise, adult competence. 

The conventional portrayal of language acquisition, then, is that neural structures that embody knowledge
systems that fit the data are selected for use, and that those that do not fit are, if not actively eliminated,
allowed to deteriorate. This would seem to account for the apparent unavailability of UG in older L2 learners
that is evidenced by their general inability to reach native-speaker-like competence (as shown by both
inferior performance and differences from native speakers (NSs) in grammaticality judgement tasks). We
might say, for example, that the L2 learner has difficulty in determining the target grammar because the
structures that underlie a characterisation of that grammar are no longer fully available. Despite its
attractions, such a view is, I believe, mistaken, as I shall attempt to show in the next section.

4. The rejection of extreme nativism in favour of constructivism

Appealing though it is, the strong continuity view of selecting existent knowledge structures that better fit the
PLD and eliminating their inferior rivals is very probably not well-founded as it stands. Recent
neurobiological evidence speaks against it, as Quartz and Sejnowski (1994, 1995) amply demonstrate.
Unfortunately, space does not allow more than a brief outline, but the conventional view of a proliferation of
neural growth followed by extensive axonal and dendritic arborisation (i.e. growth and branching of the
extensions of nerve cells) and subsequent massive neuronal death is, they argue, false. So too is the
assumption that the cortex contains a variety of structures that are, somehow, an array of knowledge systems
which may be selected in response to the environment (including the linguistic environment) the individual
finds itself in. This is evidenced by recent findings that the cortex ‘is largely equipotential at early stages’
(Quartz and Sejnowski, 1995: 28), the assumption being that equipotentiality does not equate with variety.

What is salvageable from the UG account is that basic structures capable of forming primitive
representations grow and develop throughout the course of acquisition, rather than a rich (and varied) set of
knowledge structures that characterise core elements of any and all human languages being ‘hard-wired’ and
available to the learner from the start. The ways in which the primitive structures develop are determined not only by
genetic factors, but by the interaction of these structures with the environment. In attempting to process
information derived from the environment, neural nets reorganise (ultimately through progressive
arborisation of both axons and dendrites, together with changes in number and position of synaptic
connections) to become structures that appropriately deal with the input they get as a result of the
environment they are in, in progressively more efficient ways. There will still be a degree of ‘pruning’ of
inappropriate connections, but the emphasis is more to be placed on environmentally determined growth.

Such thinking fits well with a ‘maturational view’ of UG (such as that espoused by Clahsen et al., 1994), as
well as with recent ideas concerning the ‘initial state’ in SLA (see, for example, Schwartz and Eubank, 1996;
Vainikka and Young-Scholten, 1996; Schwartz and Sprouse, 1996; Eubank, 1996). If we look at language
acquisition as largely a process of specialisation through growth, rather than a selection and pruning of what
is already built in, then the later L2 acquisition begins, the less the initial L2 state will have in common with
the initial L1 state. The L2 learner will be obliged to make use of the growth (and thus knowledge structures)
already established by the acquisition of the L1, and interference will be inevitable.

To reiterate, the account with which we may replace the regressive/selectionist/eliminativist/extreme nativist
UG hypothesis is one in which the neural structures that embody linguistic knowledge develop and evolve in
response to the processing of PLD. The principle that governs the development of such structures (indeed, it
may be seen as the very engine of learning itself) is one of maximising efficiency in neural nets responsible
for linguistic processing. The idea is that any net will develop a structure, through progressive
self-organisation, that will process the types of inputs it receives to produce appropriate outputs with minimal
expenditure of energy.

5. Energy Sheet Topologies and neural nets

Representations (and constraints and processing propensities) in nets are taken to be distributed (see, for
example, Hinton & Anderson 1989/1981; McClelland and Rumelhart, 1985, 1986; Rumelhart and
McClelland, 1986; Churchland, 1986; Schwartz, 1988; Smolensky, 1988; Bechtel and Abrahamsen, 1991;
Aleksander and Morton, 1993), in that they are not to be identified with the activation of single neurons.
There are no ‘grandmother cells’ (i.e. single cells the activation of which is associated with a single
composite concept such as ‘grandmother’; see Anderson and Mozer, 1981; Churchland, 1986). How then
are we to imagine the operation of distributed representations and distributed ‘soft’ constraints? The answer
is to think of a net as a device for converting inputs to outputs. Inputs and outputs will be distributed too, but
we can think of an input of type x entering an energy landscape at point x (or in region x, depending on how
specific we want our notion of a particular input to be), with inputs of other sorts entering at other points.

Imagine a rubber sheet that reflects the degree of energy required to convert inputs to outputs (technically
referred to as a ‘Lyapunov sheet’ in the literature; see e.g. Kosko, 1992: 77). Inputs of different kinds will
fall on the sheet in different places. Outputs will be from the lowest points in areas that ‘capture’ particular
inputs. The idea of capturing leads us to speak of dips or wells in the sheet as basins of attraction. Any input
falling within the mouth of such a basin will lead to an output of a type associated with the lowermost point
of that basin, unless there is interference from conflicting constraints in other sheets involved in the
processing. With a simple two- or three-dimensional topology, inputs and outputs are pretty much clear cut.
Figure 1a. Here, the input pattern Q will move down the slope of the basin of attraction as the system settles
and result in the output P (from Kosko, 1994: 212).




Figure 1b. Basins of attraction in an energy sheet. Imagine an input as being like a ball placed on, or falling
onto, the sheet. Any input falling into the area of the basin will result in an output corresponding to the
lowermost point of that basin. (From Wasserman, 1993: 18.)
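The settling behaviour described above can be illustrated with a Hopfield network, a standard artificial net whose state updates never raise a Lyapunov energy. This is a minimal sketch, not a model taken from the paper: the +1/-1 units, the Hebbian (outer-product) weights and the patterns P, R and Q are all illustrative assumptions, with Q playing the role of the input in Figure 1a and P the output at the bottom of its basin.

```python
import random

def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights for +1/-1 patterns, with no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def energy(w, s):
    """Lyapunov energy E(s) = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def settle(w, state, seed=0, max_sweeps=20):
    """Asynchronous updates: each unit takes the sign of its local field.
    With symmetric weights and a zero diagonal, no update can raise the energy."""
    rng = random.Random(seed)
    s = list(state)
    history = [energy(w, s)]
    for _ in range(max_sweeps):
        changed = False
        order = list(range(len(s)))
        rng.shuffle(order)
        for i in order:
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
                history.append(energy(w, s))
        if not changed:  # a fixed point: the bottom of a basin
            break
    return s, history

# Hypothetical stored patterns (two wells) and a noisy input Q near P.
P = [1, 1, 1, 1, -1, -1, -1, -1]
R = [1, -1, 1, -1, 1, -1, 1, -1]
Q = [1, 1, 1, -1, -1, -1, -1, -1]  # P with one unit flipped

w = hebbian_weights([P, R])
output, energies = settle(w, Q)
print(output == P, energies)  # → True [-1.5, -3.0]
```

The input Q falls within the mouth of P's basin, so the net rolls down to P and the recorded energy only ever falls.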

Topologies determine processing propensities and reflect sets of ‘soft’ constraints. Simply (albeit
inaccurately) put, if we imagine a topology in three dimensions, the peaks and slopes represent
constraints against processing, where more energy is required to convert input to output, while the valleys
and wells represent propensities, areas where the conversion of input requires less energy.

This is taken to reflect the fact that learning in brains is thought to reside in protein synthesis that strengthens
appropriate synaptic connections and thus eases either the evocation of patterns of activation that are
representations of what has been learnt or the replication of behaviour that has been
learnt (Rose 1993). In either case, we can say that the pattern of connectivity creates a topology in which
constraints and propensities are inherent, and that this is how knowledge of any kind is instantiated.

How true this view of learning is, is difficult to say, but it is the view that is presently favoured by most
researchers and a great deal of evidence appears to support it (see Rose, 1993 for an overview). It is often
referred to as the Marr-Albus theory (Marr, 1969; Albus et al., 19*9), which states that the transition from short
term memory (STM) to long term memory (LTM) takes place as a result of long term potentiation (LTP) or long
term depression (LTD), or a combination of the two.

LTP results from connections in a net being strengthened, making the transmission of signals from neuron to
neuron easier, while LTD is the reverse. We can associate LTP (known to take place in the hippocampus)
with creating valleys or wells in the topology, or deepening existing ones, and LTD (known to take place in
the cerebellum) with creating peaks, shelves or plateaux, or raising the height of existing ones. LTP thus
illustrates the role of the PEP in driving learning in that the strengthening of connections responsible for
appropriate processing makes subsequent processing of a similar type easier. LTD does not, however, violate
the PEP. It is simply an alternative method for sculpting energy sheets.
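In the same energy-sheet terms, LTP and LTD can be caricatured as Hebbian strengthening or weakening of the weights between co-active units. The sketch below is an illustrative assumption, not a biological model: the pattern and the step size are hypothetical, and a positive Hebbian step deepens that pattern's well (lowers its energy) while a negative step raises it, i.e. the sheet is sculpted in opposite directions.

```python
def energy(w, s):
    """Lyapunov energy E(s) = -1/2 * sum_ij w_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def hebbian_step(w, s, rate):
    """Add rate * s_i * s_j to every off-diagonal weight.
    rate > 0 caricatures LTP (strengthening); rate < 0 caricatures LTD (weakening)."""
    n = len(s)
    return [[w[i][j] + (rate * s[i] * s[j] if i != j else 0.0) for j in range(n)]
            for i in range(n)]

pattern = [1, -1, 1, -1]            # hypothetical activation pattern
w0 = [[0.0] * 4 for _ in range(4)]  # flat sheet: nothing learnt yet

e_flat = energy(w0, pattern)
e_ltp = energy(hebbian_step(w0, pattern, 0.25), pattern)   # well deepened
e_ltd = energy(hebbian_step(w0, pattern, -0.25), pattern)  # surface raised
print(e_ltp, e_ltd)  # → -1.5 1.5
```

For n units, one step changes the pattern's energy by -rate * n(n-1)/2, so the direction of the change is set entirely by the sign of the step.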

The task of the language learner is to develop energy sheet topologies that characterise knowledge of the 
target language by making progressive changes in neural architecture. The changes made will be in 
accordance with the PEP in that the learner will maximise the extent to which linguistic constraints are
satisfied in the processing of the PLD. 

6. Language Acquisition as a search problem in Hypothesis Space

The course of language acquisition may be characterised as a search problem through a multi-dimensional
space (a hypothesis or acquisition space), each point in the space representing a permissible human language
grammar (Gibson and Wexler, 1994; Turkel, 1995). One of the attractions of UG and Principles and
Parameters Theory is that the hypothesis space to be searched will be limited. One possibility is to
characterise the learner as an error-driven hill-climber, searching for a global optimum in accordance with
two constraints: Greediness (see Gibson and Wexler, 1994), moving only to positions that reflect an improved analysis of
the data at hand; and the Single Value Constraint (SVC) of Clark (1992), moving to a grammar which
differs minimally from the one presently entertained, i.e. to one that is a near neighbour in the search space.

The problem for such a model is that the learner (even in relatively simple, idealised, spaces) is likely to
encounter traps: areas of the hypothesis space which are local optima, and from which, given Greediness and
the SVC, she cannot escape in that no near-neighbour grammar is better than that characterised by the
present position (see, for example, Gibson and Wexler, 1994; Pulleyblank and Turkel, in press).
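A trap of this kind is easy to reproduce. The sketch below is a deliberately simplified, deterministic hill-climber, not Gibson and Wexler's learning algorithm itself: grammars are triples of binary parameters, the fitness values (standing for how much of the PLD each grammar analyses) are invented for illustration, each step considers only single-parameter changes (the SVC) and the learner moves only to a strictly better grammar (Greediness).

```python
def neighbours(grammar):
    """Grammars differing from `grammar` in exactly one binary parameter (SVC)."""
    return [grammar[:i] + (1 - grammar[i],) + grammar[i + 1:] for i in range(len(grammar))]

def greedy_svc_climb(start, fitness):
    """Move to the best single-change neighbour while it is strictly better (Greediness)."""
    current, path = start, [start]
    while True:
        best = max(neighbours(current), key=fitness.__getitem__)
        if fitness[best] <= fitness[current]:
            return current, path  # no better near neighbour: stop here
        current = best
        path.append(current)

# Hypothetical fitness landscape: (1, 1, 1) is the target grammar (global optimum),
# while (0, 0, 0) is a local optimum: every near neighbour scores lower.
FITNESS = {
    (0, 0, 0): 0.6,
    (1, 0, 0): 0.4, (0, 1, 0): 0.4, (0, 0, 1): 0.4,
    (1, 1, 0): 0.7, (1, 0, 1): 0.7, (0, 1, 1): 0.7,
    (1, 1, 1): 1.0,
}

print(greedy_svc_climb((1, 0, 0), FITNESS))  # reaches the target via (1, 1, 0)
print(greedy_svc_climb((0, 0, 0), FITNESS))  # trapped at the local optimum
```

Starting from (0, 0, 0), every single change lowers fitness, so the learner never moves, even though the target is only three changes away.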

Various solutions may be suggested for dealing with this problem, one being the use of Genetic Algorithms
(Clark, 1992; Pulleyblank and Turkel, in press). Whatever formulation is chosen, however, the learner needs
some criterion by which to evaluate the superiority of a proposed position with respect to the one presently
occupied. In fact, it is not a conscious choice on the part of the human learner, but a choice made by nets 
responsible for linguistic processing in accordance with their nature. The choice is based on the fact that nets 
self-organise by following the PEP. 

Optimal structures are those that maximise harmony, harmony being the extent to which constraints are
minimally violated and maximally satisfied when converting input to output. If, as a result of a proposed 
change, the harmony of the system in processing the PLD is increased, the change will be adopted. This 
feature of the system is the PEP. 
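In OT, comparing candidates for harmony is usually formalised through strict domination: candidates are compared on the highest-ranked constraint first, and lower-ranked constraints matter only to break ties. A minimal sketch of that comparison, assuming violation counts are given as dictionaries and using hypothetical constraint names C1 to C3:

```python
def violation_profile(ranking, violations):
    """Violation counts ordered from the highest- to the lowest-ranked constraint."""
    return tuple(violations.get(c, 0) for c in ranking)

def most_harmonic(ranking, candidates):
    """The optimal candidate: the lexicographically smallest violation profile."""
    return min(candidates, key=lambda name: violation_profile(ranking, candidates[name]))

ranking = ["C1", "C2", "C3"]  # C1 >> C2 >> C3 (hypothetical)
candidates = {
    "a": {"C2": 1, "C3": 2},  # 3 violations, none of C1
    "b": {"C1": 1},           # 1 violation, but of top-ranked C1
    "c": {"C2": 2},
}
print(most_harmonic(ranking, candidates))  # → a
```

Candidate "a" wins despite having the most violations in total, because it alone satisfies the top-ranked constraint and beats "c" on the next one down.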

Thus nets involved in linguistic processing will happen upon structures that better fit the PLD in that they are
more suited to an analysis of the data than those the nets presently employ. In so doing they will develop
linguistic knowledge. The greater the efficiency in converting inputs to outputs, and the greater the 
appropriacy of the conversion, the more accurate is the set of neural structures as an embodiment of 
linguistic knowledge. 

Such an account is very much in line with the thinking behind Optimality Theory (OT), a relatively new and
fast growing research programme within Linguistics, as well as being one which fits well with the 
neurobiological facts as we know them. 

7. The Optimality view of language

OT views linguistic knowledge as being instantiated in hierarchies of a finite set of violable universal
constraints (see, for example, Prince and Smolensky, 1993; Tesar, 1995; Legendre et al., 1995). In any
hierarchy, some constraints will dominate, or outrank, others, and thus will be respected over lower-ranking
constraints with which they conflict. We may call such structures Constraint Domination Hierarchies
(CDHs). Differences between languages are thus explained by variation in the CDHs that characterise them.
In consequence, OT views language acquisition as a process of progressive reranking of constraints in a
hypothesised CDH in determining a CDH that satisfactorily accounts for the PLD of the target language that
the learner is exposed to, together with the acquisition of the lexicon (Tesar and Smolensky, 1993).
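Tesar and Smolensky's proposal is that reranking proceeds by constraint demotion. The sketch below is a simplified, single-pair version on a total ranking (their algorithm works with stratified hierarchies): when the observed form (the winner) loses to some other candidate (the loser), every constraint that prefers the loser and sits above the highest-ranked constraint preferring the winner is demoted to just below it. NoCoda, Max and Dep are standard OT constraint labels used here only for illustration, and the violation counts are assumptions.

```python
def demote(ranking, winner, loser):
    """One error-driven demotion step. `ranking` lists constraints highest first;
    `winner`/`loser` map constraint names to violation counts."""
    prefers_winner = [c for c in ranking if loser.get(c, 0) > winner.get(c, 0)]
    prefers_loser = [c for c in ranking if winner.get(c, 0) > loser.get(c, 0)]
    if not prefers_winner:
        raise ValueError("no constraint favours the winner; the pair is uninformative")
    pivot = prefers_winner[0]  # highest-ranked winner-preferring constraint
    above = ranking[:ranking.index(pivot)]
    to_demote = [c for c in prefers_loser if c in above]
    if not to_demote:
        return list(ranking)  # the winner already beats the loser
    rest = [c for c in ranking if c not in to_demote]
    cut = rest.index(pivot) + 1
    return rest[:cut] + to_demote + rest[cut:]  # demoted constraints go just below the pivot

# Input /pat/ in a target language that allows codas: observed [pat], learner's output [pa].
winner = {"NoCoda": 1}  # [pat] has a coda
loser = {"Max": 1}      # [pa] deletes a segment
ranking = ["NoCoda", "Max", "Dep"]
new_ranking = demote(ranking, winner, loser)
print(new_ranking)                         # → ['Max', 'NoCoda', 'Dep']
print(demote(new_ranking, winner, loser))  # → ['Max', 'NoCoda', 'Dep'] (no further change)
```

Demotion only ever moves constraints down, and only as far as needed for the observed form to win, which keeps each reranking step small.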

7.1 The size of OT spaces

Because OT hypothesis spaces are so large and complex, learning techniques based on brute-force serial
searches or chance may be ruled out: the learner would in all likelihood be exceedingly old by the time an
appropriate CDH was happened upon. As Pulleyblank and Turkel point out:

In constructing a theory of parametric variation, changing the number of binary parameters from N to
N + 1 doubles the number of possible grammars. Adding a constraint to a system of N constraints
results in N + 1 times as many grammars. To get an idea of the magnitude of the space, consider a
learner which enumerates the possible grammars, and is able to test one grammar per second. On
average, the enumerative learner will have to test 1/2 of the grammars before finding the target. For a
system of 5 constraints, the learner would take about 1 minute to find the target grammar. The average
learning time goes up to about 231 days for a system of 11 constraints. For a system of 20 constraints,
this learner would take about 38.5 billion years.



(Pulleyblank and Turkel, in press: section 4)
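The figures in this passage can be checked directly. N constraints admit N! total rankings, so adding a constraint multiplies the space by N + 1, and a learner testing one grammar per second that needs to test half of them on average takes N!/2 seconds. A short check, assuming a year of 365.25 days:

```python
import math

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def average_search_seconds(n_constraints, grammars_per_second=1.0):
    """Expected search time for an enumerative learner that tests half of the n! rankings."""
    return math.factorial(n_constraints) / 2 / grammars_per_second

print(average_search_seconds(5) / 60)                       # → 1.0 (minutes)
print(average_search_seconds(11) / SECONDS_PER_DAY)         # → 231.0 (days)
print(average_search_seconds(20) / SECONDS_PER_YEAR / 1e9)  # about 38.5 (billion years)
```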

In view of such considerations, the learner’s search of the hypothesis space cannot be random or
enumerative. Instead, the learning process must be principled, and the most likely principle is that which
governs learning in artificial neural networks: the strengthening of appropriate connections and the
weakening or elimination of inappropriate ones, appropriate connections simply being those that more easily
produce appropriate outputs from the inputs received.
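The connection-strengthening principle appealed to here is the core of many artificial network learning rules. As one illustration (a swapped-in textbook technique, not a claim about how brains implement it), the perceptron learning rule nudges each weight up when the unit's output was too low and down when it was too high, in proportion to the input on that connection. The training data, a logical OR mapping, is hypothetical.

```python
def predict(w, b, x):
    """Threshold unit: fires (1) if the weighted input plus bias exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, n_inputs, rate=0.5, epochs=20):
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(w, b, x)  # +1: output too low, -1: too high
            # Strengthen connections from active inputs when the unit should have fired,
            # weaken them when it should not have; inactive inputs (x_i = 0) are untouched.
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
            b += rate * error
    return w, b

OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(OR_DATA, n_inputs=2)
print([predict(w, b, x) for x, _ in OR_DATA])  # → [0, 1, 1, 1]
```

Once every output is appropriate, the error is zero and the weights stop changing: learning halts exactly when the connections produce appropriate outputs.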

To an extent, such an answer does, admittedly, beg the question of how new connections are formed in order
to be tested to see whether they are appropriate. I suspect that the learner builds, by means of the action of
interneurons, a number of virtual patterns of connectivity which represent CDHs that are near neighbours of
the present position and tests these out (i.e. sees to what extent they are superior in analysing the data by
judging which ones process input based on the data with a greater degree of harmony). The learner might
then move to any position that is found to be superior to the one presently occupied, either by physically
instantiating connections that lead to a real pattern of connectivity that embodies the virtual pattern
tested, or by strengthening the connections that facilitate the pattern of firing of the interneurons that are
responsible for creating the superior virtual pattern of connectivity that is built on the real pattern that exists.
We could also imagine a blending of the two scenarios, with a progressive development of the real pattern
under the influence of the maintained success of the virtual in processing input based on the PLD.
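The proposal that the learner tests near-neighbour CDHs and keeps one only if it is superior can be sketched as a local search over rankings. In this sketch everything is an assumption rather than the author's mechanism: a near neighbour is a ranking obtained by swapping two adjacent constraints, superiority is measured as the number of observed forms in the data that the ranking selects as optimal, and the constraint labels and violation marks are hypothetical values for an input /pat/ whose observed output is [pat].

```python
def optimal(ranking, candidates):
    """Strict-domination winner: the lexicographically smallest violation profile."""
    return min(candidates, key=lambda c: tuple(candidates[c].get(k, 0) for k in ranking))

def score(ranking, data):
    """How many observed forms the ranking correctly selects as optimal."""
    return sum(optimal(ranking, cands) == observed for cands, observed in data)

def adjacent_swaps(ranking):
    """Near-neighbour rankings: one pair of adjacent constraints exchanged."""
    for i in range(len(ranking) - 1):
        r = list(ranking)
        r[i], r[i + 1] = r[i + 1], r[i]
        yield r

def step(ranking, data):
    """Adopt the best near neighbour only if it scores strictly higher."""
    best = max(adjacent_swaps(ranking), key=lambda r: score(r, data))
    return best if score(best, data) > score(ranking, data) else list(ranking)

DATA = [({"pat": {"NoCoda": 1}, "pa": {"Max": 1}, "pata": {"Dep": 1}}, "pat")]

print(step(["Max", "NoCoda", "Dep"], DATA))  # → ['Max', 'Dep', 'NoCoda']
print(step(["NoCoda", "Max", "Dep"], DATA))  # → ['NoCoda', 'Max', 'Dep'] (no better neighbour)
```

The second call mirrors the trap problem of section 6: from that ranking, reaching [pat] needs two swaps, and neither single swap improves the score on its own.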

7.2 CDH reranking compared with parameter setting

It is also likely that constraints are not reranked independently but as sets of tied changes: rerankings of sets
of related, interdependent constraints. Constraints, then, may be seen as coming in families. The parallel
with the idea of parameters (particular ‘settings’ of sets of interdependent constraints) and principles
(constraints) is striking. The fact that the position of some constraints in a CDH must be dependent on the
position of certain others is obvious if we consider the fact that many constraints are conflicting: there are
pairs of constraints that cannot both be satisfied, since the satisfaction of one rules out the satisfaction of the
other. A simple example is the constraint pair Repeat and *Repeat (do not repeat) (see Yip, to appear, for a
discussion of various types of repetition and its avoidance in Javanese). One either repeats, say, a word in a
speech stream and thus satisfies Repeat-w (repeat word) and violates *Repeat-w, or one doesn’t, in which
case *Repeat-w is satisfied and Repeat-w is violated. Repeat/*Repeat can never occupy the same stratum in
a CDH; at any moment in time one must outrank the other. This is not to say that one must always outrank
the other. Sometimes we may wish to repeat a word for the sake of emphasis (“It’s a long, long way.” “It’s
very, very complex.”); at other times we may not. At least some elements of our CDH must, therefore, be
variable, depending on the context and what it is that we wish to express. How might such a change in CDH
structure be achieved? One possibility is that we continually create a virtual pattern of connectivity in our
linguistic nets (as suggested above), and that this pattern of connectivity embodies the CDH of the moment.
This virtual pattern may be varied in some respects, at any given moment, through the intervention of
interneurons acting in a way similar to so-called Sigma-Pi units as outlined in Rumelhart and McClelland
(1986: 73-74). Admittedly, things are unlikely to be quite so simple, and it is more probable that changes are
effected in a number of different ways by many groups of neurons acting in parallel and in competition with
the influences of yet other groups.
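The Repeat/*Repeat case can be made concrete with a two-candidate tableau. In this sketch the violation marks, and the idea that an "emphatic" context selects the opposite ranking, are illustrative assumptions; the point is only that flipping the relative ranking of the conflicting pair flips the optimal output.

```python
def optimal(ranking, candidates):
    """Strict-domination winner: the lexicographically smallest violation profile."""
    return min(candidates, key=lambda c: tuple(candidates[c].get(k, 0) for k in ranking))

CANDIDATES = {
    "a long way": {"Repeat-w": 1},         # no repetition: violates Repeat-w
    "a long, long way": {"*Repeat-w": 1},  # repetition: violates *Repeat-w
}

def ranking_for(context):
    """Hypothetical context-dependent ordering of the conflicting pair."""
    if context == "emphatic":
        return ["Repeat-w", "*Repeat-w"]
    return ["*Repeat-w", "Repeat-w"]

for context in ("neutral", "emphatic"):
    print(context, "->", optimal(ranking_for(context), CANDIDATES))
```

Each candidate violates exactly one member of the pair, so whichever constraint is ranked higher decides the outcome on its own.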

A way of imagining how such sets of changes in a CDH may take place is by considering linguistic elements
as representations that exist in multi-dimensional space, and the changes as realignments of clusters of
elements within that space, as we shall see in the next section.

8. A-theory and realignment of elements in Language/Linguistic Space

Culicover and Nowak (1995) set out arguments for a conception of language acquisition based on ‘adaption’
that they consequently term ‘A-theory’. According to their view, knowledge of language is built up through
the establishment of representations and links between representations in what they term Linguistic Space, in 
response to frequency of forms in the PLD. 

The links between lexical categories establish permissible trajectories through Linguistic Space. The more
closely aligned and parallel the trajectories between categories (the categories themselves being clusters,
ultimately, of representations of lexical items), the more confidence the learner will have in the ‘rule’
expressed by the ‘envelope’ of trajectories. Thus the learner will judge a novel sentence as being
grammatical if it is composed of lexical items that can be assigned to the clusters of linguistic categories that 
lie along those permissible trajectories. 
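The trajectory idea can be caricatured in a few lines. Culicover and Nowak's model is continuous and spatial; the sketch below replaces it, purely for illustration, with counts of links between adjacent categories, where a link's count stands in for the learner's confidence in that stretch of trajectory. The lexicon, the corpus and the threshold are all hypothetical.

```python
from collections import Counter

LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N", "ball": "N",
           "saw": "V", "chased": "V"}

def categories(sentence):
    """Map each word to the category cluster it has been assigned to."""
    return [LEXICON[word] for word in sentence.split()]

def learn_links(corpus):
    """Count links between adjacent categories in the primary linguistic data."""
    links = Counter()
    for sentence in corpus:
        cats = categories(sentence)
        links.update(zip(cats, cats[1:]))
    return links

def judge(sentence, links, threshold=1):
    """Grammatical if every step of the sentence's trajectory is an attested link."""
    cats = categories(sentence)
    return all(links[step] >= threshold for step in zip(cats, cats[1:]))

PLD = ["the dog saw a cat", "a cat chased the dog"]
links = learn_links(PLD)
print(judge("the cat chased a ball", links))  # → True (novel, but on attested trajectories)
print(judge("dog the saw", links))            # → False (N -> Det never attested)
```

The novel sentence is accepted because its lexical items fall into clusters whose links have already been traversed, even though the sentence itself never occurred in the data.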

For such a system to work, linguistic items with similar properties are taken to be represented in the same
region of representational space. Culicover & Nowak call this the Local Optimization Principle:

A representational space tends to self-organize in such a way that elements with similar properties are
relatively close to one another. 



An important point here is that of ‘self-organization’. If properties of a particular set, A, of representations in
an area of Linguistic Space presently also occupied by representations B are found to make A more similar
to the set of representations X than was recognised hitherto, the space will reorganise so as to bring A and X
together. This will entail either locating A, B and X together in the same region of space, or moving A to X,
or moving X to A and relocating B. The choice made will depend on how optimal (in terms of maximising
harmony/ease of processing) the subsequent alignment of trajectories is, and is clearly analogous to
parameter setting. If the alignment of trajectories formed by moving A to X proves a better basis for analysis
of experienced data than grouping A, B and X together, or moving X to A and relocating B, this will be the
preferred move. It will be preferable in that, in accordance with the PEP, it will provide a more efficient
configuration in which fewer constraints are violated in processing the data.
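The choice among the three realignments can be sketched as picking the configuration with the lowest cost. Here, as an assumption standing in for "ease of processing", cost is a stress measure: the squared mismatch between each pair's distance in a one-dimensional space and a target distance reflecting how similar the pair is now taken to be. The positions, the anchoring items C and D and the target distances are all invented for illustration.

```python
FIXED = {"C": 6.0, "D": -1.0}  # hypothetical neighbours anchoring X and B
TARGET = {                     # hypothetical target distances (0 = maximally similar)
    ("A", "X"): 0.0, ("A", "B"): 5.0, ("B", "X"): 5.0,
    ("X", "C"): 1.0, ("B", "D"): 1.0,
}

def stress(layout):
    """Sum of squared mismatches between actual and target pairwise distances."""
    pos = {**FIXED, **layout}
    return sum((abs(pos[p] - pos[q]) - target) ** 2 for (p, q), target in TARGET.items())

# Before realignment: A and B share a region at 0.0, X lies at 5.0.
OPTIONS = {
    "group A, B and X together": {"A": 0.0, "B": 0.0, "X": 0.0},
    "move A to X": {"A": 5.0, "B": 0.0, "X": 5.0},
    "move X to A, relocate B": {"A": 0.0, "B": 5.0, "X": 0.0},
}

for name, layout in OPTIONS.items():
    print(f"{name}: {stress(layout)}")
print("preferred:", min(OPTIONS, key=lambda name: stress(OPTIONS[name])))
```

With these made-up values, moving A to X preserves both the new A-X similarity and the existing links of X and B, so it carries the lowest cost. Different anchors would favour a different move, which is the sense in which the choice resembles parameter setting.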



We can begin to see how multi-dimensional energy sheet topologies relate to an OT view of linguistic 
processing if we imagine that the (initially weak) patterns of activation (the candidate set) created by any 
input will resonate and ultimately settle on a strongly activated pattern that requires least energy to be the 
output of the net, i.e. a pattern that has the greatest harmony with the topology, the one that satisfies the 
constraints that operate in the relevant part of the topology to the greatest extent or violates them minimally 
(Smolensky, 1986; Prince and Smolensky, 1993). The optimal candidate for output is therefore the pattern of
It is important to remember that nets map inputs to outputs according to the energy topologies they create.
According to OT, a ‘grammar is a specification of a function which assigns to each input a unique structural
description or output’ (Tesar, 1995: 1). Smolensky (1995: 1) states that the ‘Universal Components of
Grammar’ (my italics) are:

a. Input Set 

b. Con: Constraint Set 

c. Gen: Candidate Set for each input 

d. H-eval: Formal procedure for evaluating Harmony/optimality 



Smolensky goes on to state: 

OT is a theory of how one level/component of a structural description is projected from another:
optimal satisfaction of ranked and violable constraints. 



As we saw in section 5, the idea of energy sheets reflecting the conversion of inputs to outputs is a good way 
of imagining the effect of distributed soft constraints and other forms of representation within a net. If we 
imagine a more complex situation in which many energy sheets intermingle in the same multi-dimensional 
space, Linguistic Space, we come closer to an image that captures the OT view of inputs creating candidate 
sets of patterns of activation which, by a process of competition, ultimately lead to an output being selected 
which is deemed optimal in terms of constraint satisfaction and therefore grammatical given the CDH or set 
of topologies that applied in the selection process. 

Optimal output candidates will be ones that exhibit the greatest harmony. Different parts of the system will 
come up with optimal candidates, based on input to those parts of the system and the topologies in the 
resolution of Linguistic Space that is reflective of those parts. The optimal candidates will then become the 
input for other parts of the system. We will attempt to portray this schematically in the next section. 

10. What do neural networks do?

Neural nets map vectors in regions of multi-dimensional space (called, alternatively, vector space or state
space; see Churchland, 1986, though she predominantly uses the term phase space). Input vectors are
mapped to output vectors. In cell assemblies pertaining to linguistic processing, many types of vector 
mapping are carried on in parallel. The chain is unlikely to be so simple, but the following elements seem 
likely: 

auditory stimulus ↓

creation of candidate set for output in auditory space ↓

creation of candidate set in phonetic space (phonetic feature mapping from auditory space) ↓

creation of candidate sets in morphemic and thus lexical spaces ↓

creation of candidate sets in semantic, syntactic, logical and conceptual spaces

This will then lead to feedback through the chain to make the whole system settle on a solution (thus making 
all inputs together make sense - comparing the ‘as it appears to be' with what it 'should' be). A more 
familiar way of expressing this process might be as follows: 

sound hits eardrum, signals travel to Wernicke's area ↓



possible morphemes in the phonological/morphological spaces are primed ↓

possible lexical items are primed ↓

suitability of primed lexical items is checked in semantic, syntactic and logical areas ↓

morphemes are decided upon, given feedback from semantic via lexical areas ↓

best choice lexical items are identified
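The priming and checking steps can be illustrated with homophones, where one phonological form primes several lexical items and feedback from semantic space decides between them. In this minimal sketch, the lexicon, the semantic feature labels and the overlap-count scoring rule are all hypothetical assumptions. They are chosen only to make the idea of a suitability check concrete.

```python
from typing import Dict, Optional, Set

# Hypothetical lexicon: phonological form -> primed lexical items,
# each with a bundle of semantic features.
LEXICON: Dict[str, Dict[str, Set[str]]] = {
    "/raɪt/": {
        "write": {"action", "language", "hand"},
        "right": {"direction", "correctness"},
        "rite": {"ritual", "religion"},
    },
}


def prime(form: str) -> Dict[str, Set[str]]:
    """All lexical items sharing the phonological form are primed."""
    return LEXICON.get(form, {})


def select(form: str, context: Set[str]) -> Optional[str]:
    primed = prime(form)
    if not primed:
        return None
    # Suitability check: favour the item whose semantic features overlap
    # most with the features currently active in the context.
    return max(primed, key=lambda item: len(primed[item] & context))


print(select("/raɪt/", {"language", "hand", "name"}))  # write
print(select("/raɪt/", {"ritual", "wedding"}))         # rite
```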

Production, in the form of speech, would presumably involve the following elements:

generation of possible output candidates in conceptual, semantic and logical spaces ↓
generation of candidate sets in syntactic and lexical spaces ↓
generation of candidate sets in morphological and phonological spaces ↓
feedback (this probably goes on from the beginning of the process) ↓

strong priming of (a reduced number of) syntactic, lexical and thus phonological candidates in their respective
spaces ↓

priming (activation of candidate set) in Broca's area for production of sound stream ↓

final feedback loops and settling of system, resulting in selection (strong activation) of optimal outputs ↓

production (speech)

This is, of necessity, highly simplified, but gives us a basis for discussion and examination, and is a useful 
starting point for progressive refinement. 
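The 'final feedback loops and settling of system' step can be pictured as a competition in which candidates inhibit one another until activations stop changing. The sketch below uses a simple lateral-inhibition update as a stand-in for that settling. The candidate words, their bottom-up support values, and the rate and inhibition parameters are hypothetical. The update rule is an illustrative choice rather than a claim about how Broca's area works.

```python
from typing import Dict


def settle(support: Dict[str, float], rate: float = 0.2, inhibition: float = 0.5,
           tol: float = 1e-6, max_steps: int = 1000) -> Dict[str, float]:
    """Iterate competing activations until the largest change falls below tol."""
    act = {c: 0.0 for c in support}
    for _ in range(max_steps):
        total = sum(act.values())
        new = {}
        for c, a in act.items():
            # Each candidate is pushed up by its own support and down by
            # the combined activation of its competitors.
            delta = rate * (support[c] - inhibition * (total - a) - a)
            new[c] = min(1.0, max(0.0, a + delta))  # keep activation in [0, 1]
        change = max(abs(new[c] - act[c]) for c in act)
        act = new
        if change < tol:
            break
    return act


result = settle({"dog": 0.9, "hound": 0.6, "cat": 0.1})
print(max(result, key=result.get))  # dog
```

With these numbers, the system settles with 'dog' strongly active, 'hound' weakly active and 'cat' suppressed entirely.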

The diagram is a schematic attempt to portray a dynamic system in which all the constraints exist and in
which they always play a part, but the state of the system at any particular moment in time will influence the
degree to which some constraints are respected or violated. This being the case, there will be subtle
differences in the CDH which are context dependent. These differences are not necessarily linked only to particular
lexical items; it is likely that emotional, semantic and pragmatic factors will also play a part, as we shall
outline in the next section.
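One way to picture constraints that are always present but respected to different degrees depending on the state of the system is to give them numerical weights that context can scale up or down. This sketch swaps in weighted constraints, as in Harmonic Grammar, rather than OT's strict ranking. The constraint names, weights, candidates and context gains are hypothetical.

```python
from typing import Dict

# Hypothetical constraint weights in a neutral context.
BASE_WEIGHTS = {"ECONOMY": 2.0, "POLITENESS": 1.0}

# Hypothetical candidates with their constraint violations.
CANDIDATES = {
    "Would you mind closing the window?": {"ECONOMY": 2, "POLITENESS": 0},
    "Close the window.": {"ECONOMY": 0, "POLITENESS": 1},
}


def harmony(violations: Dict[str, int], weights: Dict[str, float]) -> float:
    """Harmony is the negated weighted sum of violations (higher is better)."""
    return -sum(weights[c] * n for c, n in violations.items())


def choose(context_gain: Dict[str, float]) -> str:
    # Context scales each weight; constraints not mentioned keep their base weight.
    weights = {c: w * context_gain.get(c, 1.0) for c, w in BASE_WEIGHTS.items()}
    return max(CANDIDATES, key=lambda cand: harmony(CANDIDATES[cand], weights))


print(choose({}))                   # Close the window.
print(choose({"POLITENESS": 5.0}))  # Would you mind closing the window?
```

Both constraints are active in every context. Only their relative strength shifts, so the same system can yield different optimal outputs.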

11. One or many CDHs?

The idea of a speaker entertaining a number of CDHs is at first sight problematic in that it would require
alternate, virtual, patterns of connectivity, but it seems likely that that is precisely how we are able to speak
more than one language. To what extent we use the same nets for processing two or more languages is still
an open question, but it is impossible to seriously contemplate the systems for two languages being entirely
separate. At the very least, we will use the same systems of conceptual and logical space. We will use other
spaces too, to the extent that this is practical. It is possible to imagine a separate lexical space for an L2, but
it is doubtful, even in the case of lexical items, that total separation is possible. It is inevitable that we will
strive to find similarities between lexical items in our L1 and those in the L2. Every time a similarity is
discovered, there will be a point of intermingling of L1 and L2 space.

Even if we speak two languages quite well, there will be very little mixing of the two in production if we live
in a setting where we normally speak only one of the languages. So it seems that if the two languages,
determined by their respective CDHs, do somehow exist in the same space, we can automatically restrict
selection to only one language within that space. How might this be done? One possibility is that interneurons
set up virtual patterns of connectivity within a net; in this sense, the L1 and L2 would be like virtual
machines run on the real machine of the net.



Sometimes our selection mechanism relaxes - if, for example, we are trying to communicate with a native
speaker of the L2 who also speaks our L1. In such circumstances it is not uncommon, nor is it
unreasonable, for a degree of mixing to take place. Obviously, this does not mean that we mix
indiscriminately - such a policy would quickly lead to a breakdown in communication. It is not a haphazard
affair, but something of an ongoing negotiation, with both languages being used in an attempt to maximise
communication (and perhaps also to express empathy).
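The suggestion that interneurons could restrict selection to one language, and that this restriction can be relaxed, can be sketched with a multiplicative gate of the kind described in the footnote on interneurons. Here a language-context signal multiplies the connection from a concept to each language's word. The words, activation values and gate settings below are hypothetical illustrations.

```python
from typing import Dict


def gated_input(concept_activation: float, gate: float, weight: float = 1.0) -> float:
    # Multiplicative (sigma-pi style) connection: the gate scales the
    # concept's influence on the word; a gate of 0 blocks it entirely.
    return weight * concept_activation * gate


def lexical_inputs(concept_activation: float, l1_gate: float, l2_gate: float) -> Dict[str, float]:
    # Hypothetical L1 (English) and L2 (German) words for the same concept.
    return {
        "dog": gated_input(concept_activation, l1_gate),
        "Hund": gated_input(concept_activation, l2_gate),
    }


print(lexical_inputs(0.8, l1_gate=1.0, l2_gate=0.0))  # monolingual setting
print(lexical_inputs(0.8, l1_gate=0.6, l2_gate=0.5))  # relaxed: mixing possible
```

In the monolingual setting, the L2 word receives no input at all. In the relaxed setting, both words receive comparable input, so either may be selected.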

We tend to stick to one language in a monolingual setting for the simple reason that it would be unreasonable
for us to express ourselves in a language that our interlocutors didn't understand. Even in an L1 setting,
however, we rarely speak precisely the same language with everyone we try to communicate with. If I return
to the region I grew up in, Yorkshire, and meet local people in a pub, I notice not only that my accent
(rhythm, pronunciation, intonation, speed) changes, but that I also use words and expressions that I would
not normally use. Does this mean that we entertain a whole panoply of subtly different CDHs? This seems
rather doubtful in that it would require that we have an alternate pattern of connectivity for every CDH. What
seems more likely is that the CDH is the (perhaps virtual) topology of the moment, and the topology is
variable inasmuch as the pattern of connectivity can be influenced by context. In this sense, our CDH is
constantly under review and being changed by what we have just heard, how we have reacted to it, what we
have decided we want to express and how, etc.
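The idea that the CDH of the moment is continually revised by what we have just heard can be sketched as a running value updated after each utterance. Here that value is an exponential moving average of how often the interlocutor uses a local form. The update rate, the threshold and the choice between Yorkshire 'nowt' and standard 'nothing' are hypothetical illustrations of the accent-and-vocabulary shift described above.

```python
def update(level: float, heard_local_form: bool, rate: float = 0.3) -> float:
    """Move the accommodation level a fraction of the way towards the latest input."""
    target = 1.0 if heard_local_form else 0.0
    return level + rate * (target - level)


def choose_word(level: float, threshold: float = 0.5) -> str:
    # Hypothetical: above the threshold, the local form wins.
    return "nowt" if level > threshold else "nothing"


level = 0.0
for heard in [True, True, True]:
    level = update(level, heard)
    print(round(level, 3), choose_word(level))
# 0.3 nothing
# 0.51 nowt
# 0.657 nowt
```

The same mechanism runs in reverse. Back among speakers who use only 'nothing', each utterance heard pulls the level down again.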

If this is true, we do not entertain countless CDHs, i.e. countless CDHs are not physically instantiated in our 
heads, but we possess a system which is able to variably evoke countless variations in energy sheet 
topologies, CDHs, though only one CDH exists at any particular moment. 

Rather than viewing the learner as having a single grammar/CDH, it is perhaps preferable to imagine her as
being in possession of a cluster of potential CDHs, remembering that we are constantly changing our CDHs
in subtle ways in response to all input processed. We may view such a cluster as a sort of candidate set from
which to choose, noting that any CDH chosen will have a great deal in common with the others. In
this sense we might imagine the cluster as a set of variable elements of the favoured CDH, any of which
can be evoked at any particular moment in time.
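A cluster of potential CDHs sharing most of their structure can be sketched as a favoured ranking plus a few positions whose order is allowed to vary. Enumerating the possible flips yields the cluster. The constraint labels C1 to C4 and the choice of variable pairs are hypothetical. This representation is an illustrative assumption rather than a mechanism proposed in the text.

```python
from itertools import product
from typing import List, Tuple

FAVOURED = ("C1", "C2", "C3", "C4")  # hypothetical favoured ranking
VARIABLE_PAIRS = [(0, 1), (2, 3)]    # non-overlapping positions whose order may flip


def cluster(favoured: Tuple[str, ...], pairs: List[Tuple[int, int]]) -> List[Tuple[str, ...]]:
    """Every ranking obtained by flipping any subset of the variable pairs."""
    variants = []
    for flips in product((False, True), repeat=len(pairs)):
        ranking = list(favoured)
        for (i, j), flip in zip(pairs, flips):
            if flip:
                ranking[i], ranking[j] = ranking[j], ranking[i]
        variants.append(tuple(ranking))
    return variants


for ranking in cluster(FAVOURED, VARIABLE_PAIRS):
    print(" >> ".join(ranking))
```

Each of the four printed rankings shares most of its structure with the favoured CDH, and any one of them could be the CDH evoked at a given moment.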

12. Conclusion

At first sight, it seems obvious that knowledge must emanate either from the organism (that which is built in,
i.e. innate) or from the environment. Given the poverty of the stimulus argument, the appeal of the UG hypothesis
is strong, and it would be folly to reject it without determining a mechanism for explaining aspects of a NS's
knowledge that cannot be accounted for by positive evidence in the PLD. At the same time, we should not
presume that such knowledge, if not derivable from the input, is necessarily built in. The third possibility is
that the knowledge arises from the interplay between what is built in and the environment. This might be
described as a feature of the linguistic system considered as a dynamic system.

Such reasoning is unlikely to appeal to those who would like to keep the world conceptually simple - it is
satisfying to draw clear lines between domains, and to limit domains in number - but we should not ignore
good evidence, and we ignore the insights derived from machine learning and the neurosciences at our peril.
The fact is that the real world is not so simple and sometimes refuses to fit into clear conceptual categories
that are easy for us to grasp. The positive corollary is that the real world, in its complexity, is found to be
more interesting, and thus even more worthy of investigation.
To clarify this, let us use a few symbols to represent such representations and what they represent. Let's say that there
is a speaker and a listener, and in an exchange the speaker says 'dog'. Let's call this sound stream an E-rep, in that it is an
external representation (a sign) of what the speaker wishes to convey to the listener. It is a pattern of sound waves in an
external medium, usually air. The listener creates a representation of this sound stream in response to the pattern of
movements of his eardrum when the E-rep hits it. This is the word 'dog', as it sounds to the listener. Let's call this I-rep1,
with 1 indicating the outermost level of representation on the part of the listener. I understand that some readers
may already balk at the idea of this experience (created by the sound stream and the listener's sense of hearing) being a
representation. Such readers feel that the experience is a sound, and not 'just' a representation of a sound. However, it is
evident that the experience (of sound) we have as a listener when we hear a sound is not the same thing as the (physical)
sound which causes the experience. Even if we are reluctant to say that the experience is a representation of the external
world 'object' that the sound stream is, we at least have to recognise that sound as a physical entity outside our heads
and sound as an experience inside our heads are different entities in a causal chain and exist at different coordinates in
spacetime.

So far, we have an E-rep (which we may call "dog"E) and an I-rep1 ("dog"1). Next, the listener identifies the
phonetic form (which might be the same for different lexical items) 'dog'. This is the next level of representation, and
we may call it I-rep2. Again, some people may balk at this, wondering how the phonetic form of a word can be said to
represent anything. Let's say, then, that it is a bridge between the raw, in itself meaningless, sense experience of a
particular sound and the range of linguistic entities that that sound can represent, which we may call I-rep3. The
linguistic entities will be composed of further sets of representations which we may identify as bundles of concepts
pertaining to sense experience and categorisation. In the case of the lexical item dog (as a noun referring to the canine),
these will include such things as 'canine', 'animate', 'faithful', 'potentially dangerous', together with a range of visual,
auditory, olfactory and tactile qualities based on the individual's experience of what dogs look, sound, smell and feel
like, as well as linguistic features such as 'able to take agent or patient roles'. Such concepts lie at a level of
representation that underpins I-rep3, and we may call it I-rep4.

Thus we may think of entities that exist at the levels E-rep and I-rep1 to I-rep4 as elements in a causal chain of association, with any
element in the chain being capable of evoking the next. This is what representations do. We may describe
this schematically:

Production                                            Comprehension

Speaker:  I-rep4 → I-rep3 → I-rep2 → I-rep1 → E-rep (sound stream)

Listener: E-rep (sound stream) → I-rep1 → I-rep2 → I-rep3 → I-rep4
Such traps are evidently avoided in normal L1 acquisition, but may play a part in accounting for fossilisation in L2
acquisition.



Genetic Algorithms are a heuristic search method involving the 'breeding' of a population of candidate networks, with
individual networks competing with each other on the basis of 'fitness' (their relative success in converting inputs to
appropriate outputs) for the chance to 'survive' and 'breed' and thus create the next generation of candidate networks.
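A genetic algorithm of the kind described can be sketched for the simplest possible 'network': a single linear unit with two weights. The task (learning to output the sum of two inputs), the population size, the mutation scale and the crossover scheme are all hypothetical choices for illustration. The fitter half of each generation survives unchanged, so the best error never gets worse from one generation to the next.

```python
import random
from typing import List, Tuple

# Hypothetical task: the unit should output x1 + x2 for each input pair.
DATA = [((0.0, 1.0), 1.0), ((1.0, 1.0), 2.0), ((2.0, 0.5), 2.5), ((0.5, 0.0), 0.5)]


def error(weights: List[float]) -> float:
    """Sum of squared errors; lower error means higher fitness."""
    return sum((weights[0] * x1 + weights[1] * x2 - y) ** 2 for (x1, x2), y in DATA)


def evolve(pop_size: int = 20, generations: int = 50, mutation: float = 0.1,
           seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)               # fittest networks first
        history.append(error(population[0]))
        parents = population[: pop_size // 2]    # the fitter half survives
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]   # uniform crossover
            child = [w + rng.gauss(0, mutation) for w in child]  # mutation
            children.append(child)
        population = parents + children
    population.sort(key=error)
    return population[0], history


best, history = evolve()
print([round(w, 2) for w in best], round(history[-1], 4))
```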

Interneurons, as their name suggests, stand between other neurons and affect the connection between them. What I
really have in mind is the sort of activity carried out by what Rumelhart and McClelland call 'conjuncts' or 'gated pairs'
(see Rumelhart and McClelland, 1986: 73/74), in which there is a branching connection between two neurons, A and B,
which joins before reaching a third neuron, C, that also receives an input from a further neuron, D. We can simulate the
activity of such a branching connection if we multiply the outputs of A and B. We can then calculate the net input to C
by adding this product to any other inputs C receives from other units. For example, if C's input from the other unit, D,
is always +1 (excitatory), then if A's output is 1 and B's is 0, C's net input will be 1, and if A's output is 1 and B's is also 1, C's
net input will be 2. Without the branching connection, i.e. if A and B were each directly connected to C, the net input to
C would have been a simple sum of the inputs from A, B and D, giving 2 (1 + 0 + 1) and 3 (1 + 1 + 1), instead of
1 ((1 × 0) + 1) and 2 ((1 × 1) + 1). A link such as that between A and B is called a gate, and the pair of units linked by the
gate are referred to as conjuncts (Rumelhart, Hinton and McClelland, 1986: 73); the units as a whole are called Sigma-Pi
units.
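The arithmetic in this example can be checked directly. The function names here are our own, and the units are reduced to bare numbers, so this is only a restatement of the calculation, not a simulation of real neurons.

```python
def net_input_gated(a: float, b: float, d: float) -> float:
    # A and B are conjuncts: their outputs are multiplied before reaching C,
    # then added to C's other input from D.
    return a * b + d


def net_input_ungated(a: float, b: float, d: float) -> float:
    # Without the gate, C simply sums the inputs from A, B and D.
    return a + b + d


for a, b in [(1, 0), (1, 1)]:
    print(a, b, net_input_gated(a, b, 1), net_input_ungated(a, b, 1))
# 1 0 1 2
# 1 1 2 3
```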

Vectors, in this sense, are elements of vector space. Mathematically, a vector is a quantity that has magnitude and
direction. For example, I can map certain qualities of someone I know by placing a point at a certain distance along a
line that represents that quality. Given three qualities, such as kindness, appearance and intelligence, we can place a
single point inside a cube that represents the degree to which the person is kind, good-looking and intelligent, the axes
(dimensions) representing each of the qualities in turn. If we take this point and map it to another vector space, say one
describing attractiveness (if we assume that this depends on how kind, good-looking and intelligent a person is), we have
performed the function of a neural net. In this case, the input vector has the components kindness, appearance and
intelligence, and the output vector has a single component, say, attractiveness.
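The mapping in this example can be written as a single unit that takes a three-dimensional input vector to a one-dimensional output. The weights below (how much each quality is assumed to contribute to attractiveness) are invented for illustration. A real net would learn its weights rather than have them set by hand.

```python
from typing import Sequence

# Hypothetical weights for kindness, appearance and intelligence; they sum to 1,
# so inputs scaled to [0, 1] give an output in [0, 1].
WEIGHTS = (0.4, 0.3, 0.3)


def attractiveness(person: Sequence[float]) -> float:
    """Map a point in the 3-D 'qualities' space to a point on a 1-D scale."""
    return sum(w * x for w, x in zip(WEIGHTS, person))


print(round(attractiveness((0.9, 0.5, 0.7)), 2))  # 0.72
```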